Introduction

Hui Lin @Netlify

Ming Li @Amazon

2020-06-21

Course Website

https://course2020.scientistcafe.com/

The term no one really defined

Data science is the discipline of making data useful. OK, so what is it in practice?

Every company claims to be data-driven, but what that means differs from company to company.

Excerpt from How Airbnb Democratizes Data Science With Data University:

Three tracks of data science

(This is a group effort by the authors listed at https://github.com/brohrer/academic_advisory/blob/master/authors.md.)

Engineering

  1. Data environment: data storage, Kafka platform, Hadoop and Spark clusters, etc.

  2. Data management: parsing the logs, web scraping, API queries, and interrogating data streams.

  3. Production: integrating models and analyses into the production system
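
To make the "parsing the logs" item under data management concrete, here is a minimal Python sketch. The log format (`<timestamp> <LEVEL> <message>`), the sample lines, and the helper names `parse_log_line` and `count_levels` are all assumptions made up for illustration; real logs will need a pattern of their own.

```python
import re
from collections import Counter

# Hypothetical log format, assumed for illustration: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match the assumed format."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def count_levels(lines):
    """Count parsed lines per log level, skipping lines that cannot be parsed."""
    records = (parse_log_line(line) for line in lines)
    return Counter(record["level"] for record in records if record is not None)


# Made-up sample data, including one malformed line.
sample_lines = [
    "2020-06-21T10:00:00 INFO user logged in",
    "2020-06-21T10:00:05 ERROR payment failed",
    "not a log line",
    "2020-06-21T10:00:09 INFO page viewed",
]

level_counts = count_levels(sample_lines)
print(level_counts)
```

Returning `None` for unparseable lines, rather than raising, is one possible design choice: it lets a batch job keep going and report how many lines were skipped.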

Analysis

  1. Domain knowledge

  2. Exploratory analysis

  3. Storytelling

Modeling

  1. Supervised learning

  2. Unsupervised learning

  3. Customized model development

General Process of Modeling/Analytics

Some points of confusion, with more to come

Types of Questions (Modeling/Analytics)

Data Science Types vs. Needs

Data Flow

Data Science Roles

Startup vs. Mature Company (Pre-IPO/Public)
